Machine learning models to predict systemic inflammatory response syndrome after percutaneous nephrolithotomy

Objective The objective of this study was to develop and evaluate the performance of machine learning models for predicting the possibility of systemic inflammatory response syndrome (SIRS) following percutaneous nephrolithotomy (PCNL). Methods We retrospectively reviewed the clinical data of 337 patients who received PCNL between May 2020 and June 2022. In our study, 80% of the data were used as the training set, and the remaining data were used as the testing set. Separate prediction models based on the six machine learning algorithms were created using the training set. The predictive performance of each machine learning model was determined by the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity and specificity using the testing set. We used coefficients to interpret the contribution of each variable to the predictive performance. Results Among the six machine learning algorithms, the support vector machine (SVM) delivered the best performance with accuracy of 0.868, AUC of 0.942 (95% CI 0.890–0.994) in the testing set. Further analysis using the SVM model showed that prealbumin contributed the most to the prediction of the outcome, followed by preoperative urine culture, systemic immune-inflammation (SII), neutrophil to lymphocyte ratio (NLR), staghorn stones, fibrinogen, operation time, preoperative urine white blood cell (WBC), preoperative urea nitrogen, hydronephrosis, stone burden, sex and preoperative lymphocyte count. Conclusion Machine learning-based prediction models can accurately predict the possibility of SIRS after PCNL in advance by learning patient clinical data, and should be used to guide surgeons in clinical decision-making.


Introduction
Kidney stones are one of the most common urological diseases, and its prevalence is reported to be increasing worldwide [1].According to reports, Chinese adults have a 5.8% incidence rate of renal calculi, with approximately 1 in 17 adults currently diagnosed [1,2].Since the first report on percutaneous nephrolithotomy (PCNL) in 1976, it has gradually become the standard of care for patients with calculi larger than 2 cm, multiple or staghorn [3].
PCNL has certain advantages including minimal trauma, high stone clearance rates, short hospital stays and quick recovery.However, it also has several complications after surgery.Systemic inflammatory response syndrome (SIRS) is a common and serious complication associated with PCNL, with an incidence rate of 16.7-27.4%[4].A urosepsis incidence between 0.3% and 4.7% can develop from postoperative SIRS if not diagnosed and treated early [5].When sepsis progresses to septic shock or multiple organ failure, it causes high mortality rates and increased treatment costs.
Machine learning algorithms have increasingly been used to aid diagnosis, treatment, and automatic classification in medicine as statistical theory and computer technology have developed [6].In the past studies, machine learning models has been uesd to predict the acute kidney injury after nephrectomy in patients with renal cell carcinoma and showed good predictive performance [7].Machine learning algorithms was used to develop to predict the risk of incontinence after robotassisted radical prostatectomy [8].Due to the fact that machine learning algorithms show great potential in processing complex data sets, we should develop an efficient prediction model based on machine learning to identify patients with potential risk factors for post-PCNL SIRS and closely monitor their vital signs after surgery, which can significantly reduce the burden of false alarms.The main objective of this study was to analyze the influencing factors of SIRS after PCNL.Multiple machine learning algorithms were used to construct and verify the prediction model of SIRS after surgery.The performance of each machine learning model is compared, and the optimal prediction model is proposed.

Patients
We retrospectively collected the clinical data of 337 patients who underwent PCNL in the urology department of the affiliated hospital of Qingdao University between May 2020 and June 2022 by a single surgeon.Patients were excluded from the analysis based on the following criteria: (a) a history of bilateral PCNL; (b) the presence of patients with tumors, blood system or immune system diseases; (c) congenital deformities such as polycystic kidney, horseshoe kidney and solitary kidney; and (d) missing data.This study complied with the principles of the Declaration of Helsinki and was conducted in accordance with the ethical standards of the medical ethics committee of our institution.The patient selection flow diagram was shown in Fig. 1.

Feature selection
Firstly, we conducted the statistical t-test, Mann-Whitney test and chi-square test for clinical characteristics to determine whether there are differences between the groups.Our analysis considered the feature significant when its two-tailed p-value was p < 0.05.Secondly, spearman correlation analysis was performed to reduce collinearity among features.To reduce the risk of overfitting, the least absolute shrinkage and selection operator (LASSO) algorithm was applied to select features with non-zero coefficient values.

Model building
To predict the occurrence of post-PCNL SIRS, we used six machine learning algorithms, including the support vector machine (SVM), light gradient boosting machine (LightGBM), eXtreme gradient boosting (XGBoost), k-nearest neighbor (KNN), random forest (RF) and Extra-Trees classifier.These algorithms cover a wide range of modeling methods, ensuring that we can capture complex patterns in our data and improve the accuracy of our predictions.SVM is suitable for highdimensional data and small samples, and has good classification effect.LightGBM and XGBoost are integrated methods based on decision tree, which have excellent performance and fast training speed when dealing with complex data sets.As a simple nonparametric method, KNN is easy to understand and apply.Random forest and Extra-Trees enhance the robustness and stability of the model through the integration of multiple decision trees.Choosing these algorithms allows us to comprehensively evaluate the performance of different models in predicting SIRS, ensuring that we get the best prediction results.
The data were randomly categorized into training set (80%) and testing set (20%).The training set includes 216 patients without SIRS and 53 patients with SIRS, while the testing set includes 54 patients without SIRS and 14 patients with SIRS.The training set was used to establish the prediction models using five-fold cross-validation, whereas the testing set was used to validate the prediction models using the area under the curve (AUC) of the receiver operating characteristics (ROC).We calculated the correlation coefficient between features which was used to visualize the contribution of each feature to the model predictions.

Statistical analysis
Continuous variables with normal distributions were presented as mean ± standard deviation (SD) and compared using Student's t-test.Continuous variables with non-normal distributions were presented as medians with interquartile ranges and compared using the Mann-Whitney test.Categorical variables were expressed as frequencies with proportions and compared using the chi-square test.The machine learning models were written in Python 3.7 language.LASSO algorithm and correlation analysis were implemented by importing the "scipy", "numpy", and "sklearn" packages in Python (version 3.7), and were performed using the "One-key AI" platform (http://www.medai.icu/),which was based on Python (version 3.7).The code used in this study was derived from: https://gitee.com/wangqingbaidu/Onekey-Compo.The area under the curve (AUC) was used to evaluate the predictive effectiveness of models, and DeLong test was used to compare whether the efficiency differences between the models were statistically significant.A bilateral P-value < 0.05 was considered as a measure of statistical significance.

Patient characteristics
The study included 337 patients who underwent PCNL and had complete medical records.Among these, 69 patients experienced SIRS.The patients were divided into two groups based on whether SIRS occurred after PCNL.The age, sex, preoperative N, preoperative L, preoperative PLT, preoperative Hb, uric acid, serum albumin, serum fibrinogen, serum prealbumin, NLR, PLR, LMR, SII, operation time, stone burden, urine WBC, urine culture, and staghorn stones were significantly different between the two groups.The baseline data of the included patient are shown in Table 1.

Feature selection and model building
Using spearman correlation analysis and the lasso algorithm with fivefold cross-validation (Fig. 2A, B), the 27 variables were ultimately reduced to 13 potential predictors of post-PCNL SIRS risk (Table 2), which were incorporated into the construction of the predictive model in our study.The training set and testing set consisted of 80% and 20% databases, respectively.Six machine learning algorithms were used to establish the prediction models in the training set, and the performance of the models was evaluated using the testing set and expressed by the AUC, accuracy, sensitivity, and specificity.The six machine learning algorithms that were used to predict SIRS following PCNL used the 13 selected factors as inputs.The performance results of the prediction models are shown in Table 3.The receiver operating characteristics (ROC) and the area under the curve (AUC) for each different prediction models were shown in Fig. 3    Coefficients were used to interpret the results of the best prediction model by evaluating the contribution of each variable to the prediction model in Fig. 4. As shown in Fig. 4, further analysis revealed that prealbumin contributed the most to the prediction of the outcome, followed by preoperative urine culture, systemic immune-inflammation (SII), neutrophil to lymphocyte ratio (NLR), staghorn stones, fibrinogen, operation time, preoperative urine white blood cell (WBC), preoperative urea nitrogen, hydronephrosis, stone burden, sex and preoperative lymphocyte count.

Discussion
PCNL has become the first choice of treatment for multiple or staghorn renal calculi larger than 2 cm [10].Although PCNL has the advantage of less trauma and higher stone removal rate, PCNL still has many complications compared with other minimally invasive stone surgical techniques, particularly postoperative bleeding and postoperative infection.Typical clinical symptoms are insufficient during the initial stages of SIRS, making it difficult to detect post-PCNL SIRS at an early stage.SIRS may cause sepsis or multiple organ failure if untreated promptly.Therefore, we should develop a suitable prediction model based on machine learning algorithms for early detection of SIRS.
With the evolution of statistical theory and computer technology, novel machine learning techniques have improved predictive performance compared with traditional prediction methods.Previous studies have shown that machine learning algorithms can be utilized to predict the occurrence of SIRS in intensive care unit (ICU) or emergency department (ED) patients [11].Kijpaisalratana et al. developed prediction models for early sepsis in the ED [12].Hou et al. developed a model for predicting mortality in intensive care units for patients with sepsis using the XGboost algorithm [13].However, few studies have discussed machine learning-based models for predicting SIRS after PCNL.
To our knowledge, this is the first time that machine learning algorithms have been used to predict post-PCNL SIRS.In this study, we compared the performances of six machine learning algorithms at predicting SIRS after PCNL.We take the following steps to address the potential bias inherent in retrospective studies and single-center data collection: Firstly, our rigorous data collection: ensures data consistency and accuracy to reduce information bias; Secondly, potential confounding factors were controlled through multivariate analysis to We mitigate the risk of model overfitting by the following measures: Firstly, we use the LASSO algorithm for feature selection, which can reduce the model complexity and prevent overfitting by increasing regularization terms.Secondly, the 5-fold cross-validation method is used to randomly divide the dataset several times during the training process and evaluate the model performance.Cross-validation can effectively evaluate the generalization ability of the model and prevent overfitting.Finally, we build the model through a variety of machine learning algorithms (such as support vector machine, K-nearest neighbor, random forest, eXtreme Gradient Boosting, Extra-Trees, LightGBM).By comparing the performance of different models, the model with the best performance can be selected, thus improving the model generalization ability.
In our study, the SVM model outperformed the other prediction models in terms of AUC.The excellent performance of the SVM model is mainly attributed to their good adaptability to small sample and high-dimensional datasets, as well as the effectiveness of their kernel trick in dealing with nonlinear problems.This is particularly important for common data characteristics in the medical field.In addition, the SVM model reduces overfitting by regularizing parameters, ensuring the generalization ability of the model.Therefore, we believe that SVM is a reasonable choice for this study.
We used the weight coefficient to quantify each variable's contribution to the SVM model.Preoperative urine culture, SII, NLR, staghorn stones, fibrinogen, operation time, urea nitrogen, hydronephrosis, preoperative urine WBC, stone burden, and gender were potential risk factors.Previous studies have shown that the SII was a promising prognostic indicator in hepatocellular carcinoma [14], colorectal cancer [15], gastric cancer [16], prostate cancer [17] and post-PCNL SIRS [18].Stones released inflammatory mediators, such as IL-6, IL-7, IL-8, and TNF-α, as well as increase in neutrophils [19].Platelets were rich in pro-inflammatory factors and can release active inflammatory metabolites [20].Excessive inflammatory reactions inhibited the immune responses and reduced the number of lymphocytes, which was associated with an increase in the SII value.SII, an inexpensive and readily available biomarker that can comprehensively reflect the immune status of the host, contributed the most to outcome prediction in our study.In this study, the contribution of NLR to the output of the SVM model was the second most accurate predictor of blood routine after SII.According to Kriplani et al., the NLR was an easily accessible and cost-effective predictor of post-PCNL SIRS [21].As shown in Fig. 4, preoperative lymphocytes served as a protective factor.Jager et al. highlighted that lymphocytopenia was a sign of bacteremia because of accelerated apoptosis [22].Therefore, patients with higher NLR, SII, or lymphocytopenia should receive more consideration.
Similar to previous studies, we found that positive urine culture, positive urine WBC, staghorn stone, stone burden and operation time were key factors associated with SIRS after PCNL [23].Although recent research have shown that renal pelvic urine culture and stone culture can more accurately predict urosepsis than mid-section urine culture, it took a long time to cultivate renal pelvic urine and calculi [24].Therefore, mid-bladder urine culture remained a reliable indicator because it can be easily obtained.Positive urine WBC indicated a urinary infection, which can increase the risk of SIRS after PCNL [25].Therefore, it is necessary to complete routine urine tests and urine culture examinations before PCNL to evaluate the severity of urinary tract infection.Patients with urinary tract infections.need to receive adequate anti-infection treatment before surgery.The operation cannot be performed until the infection is controlled.
It has been reported that staghorn stones may be associated with postoperative sepsis because they harbored colonized bacteria, making it difficult to sterilize urine prior to surgery [26].A larger stone burden made the operation more difficult and prolonged the operation time, which may prolong renal pelvis pressure and increase the risk of endotoxin absorption into the blood [18].Thus, more attention should be paid to patients with staghorn stones, larger stone burden, or prolonged operative time.
Several literature reports have described that fibrinogen and prealbumin are related factors for postoperative SIRS [21].Fibrinogen was a key regulator of inflammation in lung cancer [27], hepatocellular carcinoma [28], colorectal cancer [29] and urological cancers [30].The prealbumin level was a protective predictor in Fig. 4. The prealbumin level reflected the recent nutritional status of a patient and had a positive effect on the prognosis of disease.Therefore, more attention should be paid to highrisk patients with high fibrinogen and low prealbumin levels.
Gender was an independent risk factor for post-PCNL SIRS.This was related to the physiological and anatomical characteristics of the urethra in female patients.Women have a short urethra, and the orifice of the urethra is close to the vagina and anus, which are easily contaminated.
This study had several limitations.First, these machine learning models were trained and developed based on single-center data.Single-center data may have the disadvantage of data bias and small sample size, which will affect the generalization ability and accuracy of the model.Second, all the operations we studied were unilateral and single-channel PCNL, so we were unable to investigate the influence of tract number on post-PCNL SIRS.Finally, we recognize that there may be differences in reference values between different laboratories, and the use of different measurement methods may lead to differences in results, which poses challenges to the accuracy and generalizability of using machine learning models to predict the occurrence of SIRS.The heterogeneity of different laboratory datas may affect the performance of the model, and special attention should be paid when applying the model to different laboratory settings [31].In the future, we intend to carry out a multicenter study and collect data from multicenter to increase the sample size and improve the generalization ability of the model.In addition, we will conduct external validation, which can help identify and address the limitations of the model and improve its reliability and applicability.

Conclusion
We identified relevant factors of post-PCNL SIRS and developed machine learning models to predict the possibility of SIRS after PCNL.We discovered that the SVM model has the potential to enhance data analysis and clinical decision-making capabilities.

Fig. 2 A
Fig.2A The process of feature selection.We used the least absolute shrinkage and selection operator (LASSO) regression model with penalty parameter tuning conducted by fivefold cross validation according to minimum criteria.Selection of the tuning parameter (λ).Based on the minimum criteria, the vertical dotted line is plotted at the optimal value λ = 0.0110.B The vertical line was plotted with 13 selected features

Fig. 3
Fig. 3 Performance for machine learning models based on the AUC of the ROC curve.The SVM model performed the best predictive ability.AUC area under the curve, ROC receiver operating characteristic, SVM support vector machine, KNN k-nearest neighbor, LightGBM light gradient boosting machine, XGBoost eXtreme gradient boosting

Fig. 4
Fig. 4 Top 13 selected features and the corresponding variable coefficients.Y-axis shows the top 13 variables, X-axis shows their impact on the machine model.L lymphocyte, SII Systemic immune-inflammation, NLR Neutrophil to lymphocyte ratio

Table 1
Comparison of clinical factors between patients with and without postoperative SIRS BMI body mass index, WBC white blood cell, N neutrophil, L lymphocyte, M monocyte, PLT platelet, HB hemoglobin, NLR neutrophil to lymphocyte ratio, PLR platelet to lymphocyte ratio, LMR Lymphocyte to monocyte ratio, SII Systemic immune-inflammation

Table 2
Feature selection results and coeffients for each feature SII Systemic immune-inflammation, NLR neutrophil to lymphocyte ratio, WBC white blood cell, L lymphocyte

Table 3
Comparison of the performance of machine learning models in the training and testing set LightGBM, light gradient boosting machine; XGBoost, eXtreme gradient boosting